Distributional Thesaurus vs. WordNet: A Comparison of Backoff Techniques for Unsupervised PP Attachment1

نویسندگان

  • Hiram Calvo
  • Alexander Gelbukh
  • Adam Kilgarriff
چکیده

Prepositional Phrase (PP) attachment can be addressed by considering frequency counts of dependency triples seen in a non-annotated corpus. However, not all triples appear even in very big corpora. To solve this problem, several techniques have been used. We evaluate two different backoff methods, one based on WordNet and the other on a distributional (automatically created) thesaurus. We work on Spanish. The thesaurus is created using the dependency triples found in the same corpus used for counting the frequency of unambiguous triples. The training corpus used for both methods is an encyclopaedia. The method based on a distributional thesaurus has higher coverage but lower precision than the WordNet method.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic identification of word sense change across different timescales

In this paper, we propose an unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books. We construct distributional thesauri based networks from data at different time points and cluster each of them separately to obtain word-centric sense clusters corresponding to the different time points. Subse...

متن کامل

Nothing like Good Old Frequency: Studying Context Filters for Distributional Thesauri

Much attention has been given to the impact of informativeness and similarity measures on distributional thesauri. We investigate the effects of context filters on thesaurus quality and propose the use of cooccurrence frequency as a simple and inexpensive criterion. For evaluation, we measure thesaurus agreement with WordNet and performance in answering TOEFL-like questions. Results illustrate ...

متن کامل

Measuring Semantic Distance using Distributional Profiles of Concepts

Automatic measures of semantic distance can be classified into two kinds: (1) those, such as WordNet, that rely on the structure of manually created lexical resources and (2) those that rely only on co-occurrence statistics from large corpora. Each kind has inherent strengths and limitations. Here we present a hybrid approach that combines corpus statistics with the structure of a Roget-like th...

متن کامل

That's sick dude!: Automatic identification of word sense change across different timescales

In this paper, we propose an unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books. We construct distributional thesauri based networks from data at different time points and cluster each of them separately to obtain word-centric sense clusters corresponding to the different time points. Subse...

متن کامل

Disambiguating Noun Groupings with Respect to Wordnet Senses

Word groupings useful for language processing tasks are increasingly available, as thesauri appear online, and as distributional word clustering techniques improve. However, for many tasks, one is interested in relationships among word senses, not words. This paper presents a method for automatic sense disambiguation of nouns appearing within sets of related nouns — the kind of data one finds i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004